Shrinking Trees

Authors

  • Trevor Hastie
  • Daryl Pregibon

Abstract

Tree-based models provide an alternative to linear models for classification and regression data. They are used primarily for exploratory analysis of complex data or as a diagnostic tool following a linear model analysis. They are also used as the end product in certain applications, such as speech recognition, medical diagnosis, and other settings where repeated fast classifications are required or where decision rules along coordinate axes make the model easier for practitioners in the field to understand and communicate. Historically, the key problem in tree-based modeling has been choosing the right tree size. This has been addressed by applying various stopping rules during tree growing and, more recently, by pruning an overly large tree. Both approaches are intended to prevent ‘overfitting’ the data, especially when the tree is used for prediction. The approach taken in this paper provides yet another way to protect against overfitting. As with pruning, we start with an overly large tree, but rather than cutting off branches that seem to contribute little to the overall fit, we simply smooth the fitted values using a process called recursive shrinking. The shrinking process is parameterized by a scalar θ ranging from zero to one. A value of zero shrinks all fitted values to that of the root of the tree, whereas a value of one implies no shrinking at all. The shrinking parameter must be specified or otherwise selected from the data; we have used cross-validation to guide the choice in some of the applications we have examined. Shrinking and pruning are qualitatively different, although they tend to have similar predictive ability. We draw on analogies with the usual linear model to emphasize the differences as well as the similarities between the two methods. A comparison of shrinking and pruning on two data sets suggests that neither is dominant on strictly quantitative grounds.
The qualitative difference between the two is that shrinking is ‘smoother’ and less sensitive to the specific choice of its tuning parameter. Pruning on …

∗This is an unpublished AT&T technical memorandum, which formed the basis of the shrink.tree() software in Splus (“Statistical Models in S”, Chambers, J. and Hastie, T., eds, 1991, Chapman and Hall). This online version (March 2000) was made possible by Mu Zhu, who re-created the figures in the new Splus environment and made minor modifications to the original version.
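The recursive shrinking described in the abstract replaces each node's fitted value with a blend of its own mean and its parent's already-shrunken value, starting from the root. The Python sketch below illustrates that idea under a loudly stated simplification: a single constant weight θ is used at every node. This reproduces the two endpoints given in the abstract (θ = 0 shrinks every leaf to the root mean, θ = 1 leaves the fit unchanged), but it is not necessarily the node-specific weighting used in the paper or in shrink.tree(). The names `Node`, `shrunken_leaf_values`, and `choose_theta` are hypothetical, and `choose_theta` uses a single holdout split as a stand-in for the cross-validation the authors mention.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Node of an already-grown regression tree (hypothetical structure).

    Leaves hold their training responses in `y`; internal nodes hold children.
    """
    y: List[float] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def responses(self) -> List[float]:
        """All training responses that fall in this node's subtree."""
        if not self.children:
            return list(self.y)
        out: List[float] = []
        for child in self.children:
            out.extend(child.responses())
        return out

    def mean(self) -> float:
        """Unshrunken fitted value at this node: the mean response."""
        ys = self.responses()
        return sum(ys) / len(ys)


def shrunken_leaf_values(node: Node, theta: float,
                         parent_value: Optional[float] = None) -> List[float]:
    """Shrunken fitted values of the leaves under `node`, left to right.

    Simplification (assumed, not taken from the paper): one constant weight
    `theta` at every node,
        shrunk(root) = mean(root)
        shrunk(node) = theta * mean(node) + (1 - theta) * shrunk(parent)
    so theta = 0 gives every leaf the root mean and theta = 1 gives the
    ordinary (unshrunken) leaf means.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    raw = node.mean()
    value = raw if parent_value is None else theta * raw + (1.0 - theta) * parent_value
    if not node.children:
        return [value]
    values: List[float] = []
    for child in node.children:
        # Each child is shrunk toward this node's *shrunken* value.
        values.extend(shrunken_leaf_values(child, theta, value))
    return values


def choose_theta(root: Node, holdout: Sequence[Tuple[int, float]],
                 grid: Sequence[float]) -> float:
    """Pick theta from `grid` minimizing squared error on held-out data.

    `holdout` pairs the index of the leaf each held-out case falls into with
    its response; one holdout split stands in for full cross-validation.
    """
    def error(theta: float) -> float:
        fitted = shrunken_leaf_values(root, theta)
        return sum((y - fitted[leaf]) ** 2 for leaf, y in holdout)
    return min(grid, key=error)


# Toy tree: root -> [leaf A, internal node -> [leaf B, leaf C]]
tree = Node(children=[Node(y=[1.0, 3.0]),
                      Node(children=[Node(y=[6.0]), Node(y=[10.0])])])
print(shrunken_leaf_values(tree, 1.0))  # [2.0, 6.0, 10.0]
print(shrunken_leaf_values(tree, 0.5))  # [3.5, 6.25, 8.25]
print(shrunken_leaf_values(tree, 0.0))  # [5.0, 5.0, 5.0]
```

Varying θ moves every fitted value continuously between the root mean and the unshrunken leaf means, whereas pruning changes the fit in discrete steps as whole branches are removed; this mirrors the abstract's remark that shrinking is the ‘smoother’ of the two.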

Similar sources

Ultrametric Logarithm Laws I.

We announce ultrametric analogues of the results of Kleinbock-Margulis for shrinking target properties of semisimple group actions on symmetric spaces. The main applications are S-arithmetic Diophantine approximation results and logarithm laws for buildings, generalizing the work of Hersonsky-Paulin on trees.

Representing Directed Trees as Straight Skeletons

The straight skeleton of a polygon is the geometric graph obtained by tracing the vertices during a mitered offsetting process. It is known that the straight skeleton of a simple polygon is a tree, and one can naturally derive directions on the edges of the tree from the propagation of the shrinking process. In this paper, we ask the reverse question: Given a tree with directed edges, can it be...

Magnetohydrodynamics Fluid Flow and Heat Transfer over a Permeable Shrinking Sheet with Joule dissipation: Analytical Approach

A laminar, two-dimensional, steady boundary-layer flow of a Newtonian conducting fluid over a permeable shrinking sheet in the presence of a uniform magnetic field is investigated. The governing equations are converted to nonlinear ordinary differential equations (ODEs) using appropriate similarity transformations. The main idea is to transform ODE with infinite boundary condition into oth...

Differenced-Based Double Shrinking in Partial Linear Models

The partial linear model is very flexible when the relation between the covariates and responses is either parametric or nonparametric. However, estimation of the regression coefficients is challenging since one must also estimate the nonparametric component simultaneously. As a remedy, the differencing approach, to eliminate the nonparametric component and estimate the regression coefficients, can ...

Temporal variation of microfibril angle in Eucalyptus nitens grown in different irrigation regimes.

In 1990, a 2-ha plantation of Eucalyptus nitens (Deane and Maiden) Maiden was established in southeastern Tasmania and subjected to different irrigation regimes. Point dendrometers were installed in March 1995 to monitor radial stem movement every 15 min over several growing seasons. In this study, data from two growing seasons (1996-1998) were considered. From these measurements, daily increme...

Publication date: 1990